External Validation of a Predictive Model for Acute Skin Radiation Toxicity in the REQUITE Breast Cohort

Background: Acute skin toxicity is a common and usually transient side-effect of breast radiotherapy although, if sufficiently severe, it can affect breast cosmesis, aftercare costs and the patient's quality-of-life. The aim of this study was to develop predictive models for acute skin toxicity using published risk factors and externally validate the models in patients recruited into the prospective multi-center REQUITE (validating pREdictive models and biomarkers of radiotherapy toxicity to reduce side-effects and improve QUalITy of lifE in cancer survivors) study. Methods: Patient and treatment-related risk factors significantly associated with acute breast radiation toxicity on multivariate analysis were identified in the literature. These predictors were used to develop risk models for acute erythema and acute desquamation (skin loss) in three Radiogenomics Consortium cohorts of patients treated by breast-conserving surgery and whole breast external beam radiotherapy (n = 2,031). The models were externally validated in the REQUITE breast cancer cohort (n = 2,057). Results: The final risk model for acute erythema included BMI, breast size, hypo-fractionation, boost, tamoxifen use and smoking status. This model was validated in REQUITE with moderate discrimination (AUC 0.65), calibration and agreement between predicted and observed toxicity (Brier score 0.17). The risk model for acute desquamation, excluding the predictor tamoxifen use, failed to validate in the REQUITE cohort. Conclusions: While most published prediction research in the field has focused on model development, this study reports successful external validation of a predictive model using clinical risk factors for acute erythema following radiotherapy after breast-conserving surgery. This model retained discriminatory power but will benefit from further re-calibration. A similar model to predict acute desquamation failed to validate in the REQUITE cohort. 
Future improvements and more accurate predictions are expected through the addition of genetic markers and application of other modeling and machine learning techniques.


INTRODUCTION
Survivorship issues and quality-of-life (QoL) are becoming an increasingly important research focus in cancer care (1). Breast cancer survival has improved markedly, with current predicted 10-year survival rates in excess of 80% (2). Over 70% of breast cancer patients undergo radiotherapy, usually in the adjuvant setting following surgery. Radiotherapy reduces the risk of local recurrence and contributes to a reduction in overall mortality (3). Nevertheless, breast radiotherapy can be associated with several side-effects (toxicity). Acute (or early) toxicity includes breast erythema (reddening) and desquamation (skin loss) and occurs within 90 days of treatment (4). While late side-effects of radiotherapy are concerning due to their potential irreversibility, acute toxicity may cause considerable patient morbidity and can have adverse effects on the cosmetic outcome from oncoplastic breast surgery and reconstruction (5,6). There is some evidence that, if sufficiently severe, early toxicity can be associated with clinically significant late toxicity (7). Surgeons' treatment recommendations are inevitably influenced by their perception of potential complications of adjuvant treatment, such as radiotherapy (8,9). However, normal tissue reactions to radiotherapy vary considerably between individual patients. Being able to stratify individual patients according to their risk of radiation toxicity would enable breast surgeons to take this information into account when advising patients about the risks and benefits of different surgical treatment options, or even to suggest a change to the sequence of surgery and adjuvant treatment, including radiotherapy (10).
In the field of medical physics, many radiobiological predictive models have been proposed with the aim of preserving normal tissue, mostly focused on late toxicity. Normal tissue complication probability (NTCP) models, such as the Lyman-Kutcher-Burman (LKB) model, incorporate the linear quadratic (LQ) model of cell killing (11,12). Many of these dosimetric models have already been integrated into radiotherapy treatment planning systems. They generally take the form of simplified empirical models consisting of dose distribution parameters, and the risk of toxicity is assumed to depend on the mean dose to the respective target organ or the amount of damaged tissue (13).
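For readers less familiar with NTCP modeling, the general form of the LKB model can be sketched in a few lines of Python. This is a minimal illustration only: the two-bin dose-volume histogram and the parameter values (TD50, m, n) are hypothetical, the doses are assumed to have already been converted to a common fractionation (for example with the LQ model), and no dosimetry of this kind was used in the clinical models developed in this study.

```python
import math
from typing import Sequence

def geud(doses: Sequence[float], volumes: Sequence[float], n: float) -> float:
    """Generalized equivalent uniform dose (Gy) from a differential DVH.

    doses: dose in each DVH bin (Gy); volumes: relative volume in each bin;
    n: LKB volume-effect parameter (n = 1 gives the mean dose).
    """
    total = sum(volumes)
    return sum((v / total) * d ** (1.0 / n) for d, v in zip(doses, volumes)) ** n

def lkb_ntcp(doses: Sequence[float], volumes: Sequence[float],
             td50: float, m: float, n: float) -> float:
    """LKB NTCP: standard normal CDF of t = (gEUD - TD50) / (m * TD50)."""
    t = (geud(doses, volumes, n) - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Hypothetical two-bin DVH and parameters, for illustration only
ntcp = lkb_ntcp([30.0, 50.0], [0.4, 0.6], td50=45.0, m=0.2, n=0.5)
```

The probit form means that a gEUD equal to TD50 always yields an NTCP of 0.5, with m controlling how steeply risk rises around that point.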
In prostate radiotherapy, it has been shown that dosimetric models for late rectal toxicity can be improved by including clinical and other treatment risk factors, such as prior abdominal surgery, colorectal disease and diabetes (14,15). In breast radiotherapy, several studies have investigated the association of clinical and treatment risk factors with acute skin toxicity, although none have reported a clinical prediction model as such (16)(17)(18)(19)(20)(21)(22)(23)(24)(25). Integrated clinical prediction models capable of identifying patients at risk of clinically significant side-effects have now been developed in different disease sites, the majority predicting late toxicity with moderate performance (AUC ranging from 0.60 to 0.75) (26)(27)(28). There is also an increasing number of published models predicting acute toxicity, although none for breast radiotherapy (29)(30)(31).
For surgeons and other clinicians, models that include common clinical and treatment predictors are of particular interest because this obviates the need for detailed patient dosimetry and dose-volume histograms from radiotherapy planning scans. It would allow clinicians to estimate toxicity risk at the time of breast cancer diagnosis and before any treatment is planned. In breast reconstruction surgery, a small number of clinical risk models for various 30-day complications have been published (32,33), of which some have been validated for select endpoints (34,35). However, these models are chiefly designed to predict surgical side-effects, such as implant loss, surgical site infection and seroma, and include radiotherapy as a binary predictor variable only.
In the absence of an available prediction model in the literature, the aim of this study was to develop predictive models for acute breast radiation toxicity using published clinical and treatment predictors of acute skin toxicity, and to externally validate them in the REQUITE study breast cohort.

METHODS
This study was designed using data from patients who underwent breast-conserving surgery (BCS) and adjuvant external beam radiotherapy (EBRT) enrolled in three Radiogenomics Consortium (RGC) studies and the REQUITE cohort study. Candidate variables associated with acute breast radiation toxicity were identified from the existing literature. In the absence of predictive models in the literature, predictive models for acute radiation toxicity endpoints were first developed in combined RGC patient cohorts, then validated in the REQUITE patient cohort. This was a TRIPOD type 3 study, representing model development and validation using a separate dataset (36).

Model Development Cohorts
The German ISE cohort (16) included 478 breast cancer patients treated with conventional 3D conformal whole breast EBRT plus either photon or electron tumor bed boost (except for 19 patients) recruited into a prospective patient cohort at four centers in Southwest Germany between 1998 and 2001, with documented acute radiotherapy toxicity at baseline, at cumulative doses of 36-42 Gy and 44-50 Gy, at the end of radiotherapy, and 6 weeks following radiotherapy. None of the patients in ISE received chemotherapy. All patients from the ISE cohort were included in this study. The ISE study was approved by the Ethical Committee at the University of Heidelberg, Germany (reference No. 37/98).
The LeND cohort (37) consists of 633 breast cancer patients treated with conventional 3D conformal whole breast EBRT ± boost using tangential fields in Leicester, Nottingham and Derby (UK), who were recruited between 2008 and 2010 at varying time points (up to several years) after breast radiotherapy and had their normal tissue toxicity documented. Acute toxicity data were collected from medical records. After excluding the first 154 patients without data on acute toxicity, and 119 patients who had chest wall radiotherapy following mastectomy, 390 patients treated with EBRT following BCS from the LeND cohort were included in this study. The LeND study was approved by the Research Ethics Committee (reference no. 08/H0405/57).
The Cambridge cohort (19) comprised 1,144 women who received adjuvant whole breast EBRT following BCS as part of the Cambridge IMRT trial (UK) following the standard hypofractionated regimen (40 Gy in 15 fractions), 411 of whom were randomized to manual forward-planned intensity-modulated radiotherapy (IMRT) to improve dose homogeneity (reduce the volumes receiving >107 and <95% of the prescribed dose) in the irradiated breast. The remainder of patients were treated with 3D-conformal radiotherapy using wedged tangential fields. Toxicity was documented weekly during treatment according to the RTOG scale. All patients from the Cambridge cohort were included. The study was approved by the Cambridge Research Ethics Committee and written consent was obtained from all patients to use their data for research purposes.

Validation Cohort
The multicenter REQUITE breast cancer patient cohort was recruited prospectively in seven European countries and the USA between 2014 and 2016. The REQUITE study was conceived as an international multicenter validation cohort for predictive models of radiation toxicity with standardized prospective data collection (38). Patient baseline characteristics and methodology have been described in detail elsewhere (39). All 2,057 enrolled patients were treated with BCS followed by EBRT according to local protocol; approximately half were treated with IMRT, with a lower proportion in France and no IMRT at the Italian or US centers. The majority of patients received a tumor-bed boost (64%), ranging from <20% at the French, Italian and Spanish centers to over 80% at the Belgian center, given either simultaneously (n = 257) or sequentially (n = 1,138). Patients with invasive breast cancer in Belgium and the UK were treated using the START-B hypofractionated regimen. Although late toxicity was the main endpoint in REQUITE, data collected at the end of radiation treatment were used to document acute toxicity. All patients gave written informed consent. The study was approved by local ethics committees in participating countries (UK NRES Approval 14/NW/0035) and registered at http://www.controlled-trials.com (ISRCTN98496463). Characteristics of all cohorts included in this study are summarized in Table 1.

Endpoint Definition
Radiation toxicity in REQUITE was scored using CTCAE (Common Terminology Criteria for Adverse Events; Table 2) v4.0 (40). CTCAE v4.0 has separate scales for radiation dermatitis (erythema) and skin ulceration (desquamation), both of which are relevant to the acute response to radiotherapy in the breast. For both the LeND and Cambridge cohorts, acute skin toxicity was scored according to the RTOG (Radiation Therapy Oncology Group; Table 2) scale, which is organized mostly by target organ or body region (e.g., larynx, upper GI, skin) (41). The German ISE study used a modified version of the Common Toxicity Criteria (CTCAE v2.0) scale for erythema, in which grade 2 was subdivided into three sub-grades: grades 2a and 2b comprised moderate and brisk erythema, respectively, and grade 2c was defined as ≥1 moist desquamation or interruption of treatment due to side-effects.
This raised the issue of how to deal with the different toxicity scales and assessment time points used in the previously assembled cohorts and the REQUITE validation cohort. Where multiple measurements were available, the maximum recorded toxicity was used. To ensure comparability with previous studies, the following endpoints were considered where they occurred within 90 days of the start of treatment (acute toxicity), according to the different grading systems: a) acute erythema: RTOG or CTCAE grade ≥2 (at least moderate to brisk erythema); b) acute desquamation: RTOG grade ≥2b (patchy moist desquamation), CTCAE grade ≥2c erythema (moist desquamation) or CTCAE grade ≥1 skin ulceration, implying that skin integrity has been broken, either over the breast or in the infra-mammary fold.
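The harmonization rules above, together with the use of the maximum grade and the exclusion of patients with baseline toxicity grade ≥2 described under Statistical Methods, can be expressed as a small Python sketch. The scale and item labels ("RTOG", "CTCAE", "CTC_mod", "skin", "dermatitis", "ulceration", "erythema") and the `Assessment` record are hypothetical names chosen for illustration; the sketch also assumes that a grade recorded without a sub-grade letter (e.g. RTOG "2") ranks below its lettered sub-grades, which is a simplification rather than the study's documented rule.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

@dataclass
class Assessment:
    scale: str  # hypothetical labels: "RTOG", "CTCAE" (v4.0) or "CTC_mod" (ISE modified CTC v2.0)
    item: str   # "skin" (RTOG), "dermatitis" or "ulceration" (CTCAE), "erythema" (CTC_mod)
    grade: str  # e.g. "0", "1", "2", "2b", "3"
    day: int    # days from the start of radiotherapy; day <= 0 is treated as baseline

def parse_grade(grade: str) -> Tuple[int, str]:
    """Split a grade such as '2b' into (2, 'b'); a grade without a sub-grade gets ''."""
    major = int("".join(ch for ch in grade if ch.isdigit()))
    sub = "".join(ch for ch in grade if ch.isalpha()).lower()
    return major, sub

def at_least(grade: str, threshold: str) -> bool:
    """Compare the major grade first, then the sub-grade letter ('' < 'a' < 'b' < 'c')."""
    return parse_grade(grade) >= parse_grade(threshold)

def is_erythema(a: Assessment) -> bool:
    """Acute erythema: RTOG or CTC(AE) grade >= 2."""
    if (a.scale, a.item) in {("RTOG", "skin"), ("CTCAE", "dermatitis"), ("CTC_mod", "erythema")}:
        return at_least(a.grade, "2")
    return False

def is_desquamation(a: Assessment) -> bool:
    """Acute desquamation: RTOG >= 2b, modified CTC erythema >= 2c, or CTCAE ulceration >= 1."""
    thresholds = {("RTOG", "skin"): "2b", ("CTC_mod", "erythema"): "2c", ("CTCAE", "ulceration"): "1"}
    threshold = thresholds.get((a.scale, a.item))
    return threshold is not None and at_least(a.grade, threshold)

def acute_endpoint(assessments: Iterable[Assessment],
                   meets_endpoint: Callable[[Assessment], bool],
                   window_days: int = 90) -> Optional[int]:
    """Binary endpoint from the worst grade recorded within the acute window.

    Returns None for patients excluded because of baseline toxicity grade >= 2.
    """
    records = list(assessments)
    if any(a.day <= 0 and at_least(a.grade, "2") for a in records):
        return None
    # The maximum grade reaches the threshold if and only if at least one assessment does
    return int(any(meets_endpoint(a) for a in records if 0 < a.day <= window_days))
```

Checking "any assessment meets the threshold" is equivalent to checking the maximum grade, which avoids having to order grades across different scales.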

Selection and Definition of Candidate Predictors
The literature was searched through Medline using the MeSH keywords "radiation injury", "breast neoplasm", "radiotherapy", "radiation tolerance" and "risk factors", and through PubMed using keywords "radiation injury", "normal radiation toxicity", "acute", "radiotherapy", "breast cancer", "radiosensitivity" and "risk factor" or "predictor" or "radiogenomics". Reference lists from identified papers or review articles were also searched. Candidate predictor variables in the literature were considered for validation where their association with acute breast radiation toxicity endpoints on multivariate analysis was reported in at least one publication.
To ensure comparability of measures of breast size, such as breast diameter or bra size, these were converted to a single continuous variable for the purpose of this study by adding bra cup and band sizes, to represent "sister" sizes equal to the same breast volume (according to http://www.sizechart.com/brasize/sistersize/index.html). For instance, a UK size 34B bra holds an approximate breast volume equal to 32C, ∼390 cc. In the Cambridge trial cohort, breast size was graded as a categorical variable and converted accordingly.
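The exact numerical encoding used in the study is not specified beyond "adding bra cup and band sizes". One encoding that reproduces the sister-size equivalence is sketched below, under the assumption that the band is measured in inches and halved (one band step is 2 inches), and cups are ranked in order. The cup list and the function name `sister_size_score` are hypothetical.

```python
# Hypothetical ascending list of UK cup letters (UK sizing uses double letters such as DD and FF)
UK_CUPS = ["AA", "A", "B", "C", "D", "DD", "E", "F", "FF", "G", "GG", "H"]

def sister_size_score(band_inches: int, cup: str) -> float:
    """Hypothetical continuous breast-size score: half the band size plus the cup rank.

    Moving down one band size (2 inches) and up one cup changes the two terms by
    -1 and +1, so "sister" sizes such as 36A, 34B and 32C share one score.
    """
    return band_inches / 2 + UK_CUPS.index(cup.upper())

print(sister_size_score(34, "B"), sister_size_score(32, "C"))  # 19.0 19.0
```

Under this assumption the score is constant along a sister-size diagonal and increases with either a larger band at the same cup or a larger cup at the same band.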
For each patient in the REQUITE and the other three RGC cohorts, information comprising candidate predictor variables and relevant study endpoints was extracted from the data. Hypertension was not recorded in the Cambridge trial cohort, and post-operative infection was not available in the LeND and ISE cohorts. Observations on body mass index (BMI) and breast size were missing from 22 and 17% of patients, respectively, in the three combined RGC cohorts, while information on the remaining candidate predictor variables was missing in between 0.5 and 3% of patients across all cohorts.

Statistical Methods
Both endpoints were considered as dichotomized (binary) outcome measures. Where a patient had multiple measures of acute toxicity within the specified time period, the maximum grade of toxicity recorded was used. Patients with high baseline toxicity, defined as grade ≥2, were excluded from the analysis. Statistical analyses were carried out in Stata version 15.1. Continuous variables are presented as medians (with ranges), and categorical/binary variables as counts and percentages.
In order to minimize bias from analyzing only complete cases, multiple imputation (MI) by chained equations, based on all candidate predictors excluding hypertension, was used to replace missing values (42). Ten imputed datasets were created, and the results were combined across all datasets using Rubin's rules to obtain final estimates (43). The number of imputations (m = 10) was chosen according to the percentage of incomplete observations per variable, so as to reduce the error associated with estimating the regression coefficients, standard errors and resulting p-values (44). On the basis of an estimated 900 cases of acute erythema and 175 cases of acute desquamation in the three combined RGC cohorts, the consideration of nine candidate predictor variables in this analysis satisfied the methodological requirement of at least 10 events per variable (EPV) to reduce overfitting in predictive modeling (45).
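Rubin's rules and the events-per-variable check can be illustrated with a short Python sketch. In Stata the pooling is performed by `mi estimate`; the function below only shows the arithmetic for a single coefficient, and the function names are illustrative rather than taken from the study.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

def pool_rubin(estimates: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Combine one coefficient across m imputed datasets with Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

def events_per_variable(n_events: int, n_candidates: int) -> float:
    return n_events / n_candidates

# Rarer endpoint from the text: 175 desquamation events, nine candidate predictors
print(round(events_per_variable(175, 9), 1))  # 19.4
```

The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations; with 175 events and nine candidates the EPV of about 19 comfortably exceeds the threshold of 10.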
To develop clinical prediction models, a generalized linear mixed model (GLMM; Stata xtlogit) was fitted to the original dataset combining the three RGC derivation cohorts to model the probability of each toxicity endpoint. GLMMs extend generalized linear models (GLMs) by including random effects alongside fixed effects, which allows for clustering of patients within different study cohorts or within cohorts enrolling at multiple centers. As in GLMs, a link function, such as the logit link, relates the linear predictor to the outcome probability. Initially, a full model comprising all included predictor variables was fitted, followed by stepwise backward elimination to select the candidate variables to include in the final prediction model (using a deliberately lenient threshold of p < 0.1 for retention). After elimination, each excluded predictor was re-inserted into the final model to check whether it became statistically significant at this stage.
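The selection procedure (backward elimination at p < 0.1 followed by a re-insertion check) can be sketched independently of the model being fitted. In the sketch below the model fit is abstracted as a hypothetical callable that returns a p-value for each predictor; in the study this role was played by the mixed-effects logistic model. The `toy_fit` p-values are invented solely to show a predictor that only becomes significant after another has been removed.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: fit(predictors) returns a p-value per predictor
FitFn = Callable[[Sequence[str]], Dict[str, float]]

def backward_eliminate(candidates: Sequence[str], fit: FitFn, p_keep: float = 0.1) -> List[str]:
    selected = list(candidates)
    # Backward elimination: repeatedly drop the least significant predictor
    while selected:
        p_values = fit(selected)
        weakest = max(selected, key=lambda name: p_values[name])
        if p_values[weakest] < p_keep:
            break
        selected.remove(weakest)
    # Re-insertion check: add back each excluded predictor one at a time
    for name in candidates:
        if name not in selected:
            if fit(selected + [name])[name] < p_keep:
                selected.append(name)
    return selected

def toy_fit(predictors: Sequence[str]) -> Dict[str, float]:
    """Invented p-values in which x2 only becomes significant once x4 is gone."""
    p = {"x1": 0.01, "x3": 0.5,
         "x2": 0.3 if "x4" in predictors else 0.04,
         "x4": 0.2 if "x2" in predictors else 0.5}
    return {name: p[name] for name in predictors}

print(backward_eliminate(["x1", "x2", "x3", "x4"], toy_fit))  # ['x1', 'x2']
```

In the toy run, x2 is dropped during elimination but recovered by the re-insertion step, which is the situation that check is designed to catch.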
The equation for the log odds of each acute breast endpoint was formed from the estimated β coefficients multiplied by the predictors included in the model, together with the intercept across cohorts. The predicted risk of toxicity can thus be calculated as:
$$\text{log odds} = \beta_0 + \sum_{i} \beta_i x_i, \qquad \text{predicted risk} = \frac{e^{\text{log odds}}}{1 + e^{\text{log odds}}}$$
Discrimination of the fitted models was assessed by calculating the c-statistic (the area under the receiver operating characteristic curve (AUC) of the logistic model, which plots sensitivity against 1 − specificity), and calibration was assessed by examining the calibration plot across tenths of predicted risk. A c-statistic of 1 indicates perfect discrimination, whereas 0.5 indicates no discrimination. A calibration slope of 1 indicates perfect calibration and would be expected in the original datasets, as the model is being developed in the same data (apparent performance).
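The conversion from log odds to predicted risk, the c-statistic and the calibration table across tenths of predicted risk can be illustrated in plain Python. This is a minimal sketch with illustrative function names, not the Stata routines used in the study; the pairwise c-statistic is quadratic in the number of patients, which is adequate for illustration but not for large datasets.

```python
import math
from typing import List, Sequence, Tuple

def risk_from_log_odds(log_odds: float) -> float:
    """Inverse logit, algebraically equal to e^lo / (1 + e^lo)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def c_statistic(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Share of case/non-case pairs in which the case has the higher predicted risk
    (ties count one half); this equals the area under the ROC curve."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for rc in cases:          # O(n_cases * n_controls): fine for a sketch
        for rn in controls:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

def calibration_by_tenths(risks: Sequence[float], outcomes: Sequence[int],
                          groups: int = 10) -> List[Tuple[float, float]]:
    """(mean predicted risk, observed proportion) in groups of ascending predicted risk."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    table = []
    for g in range(groups):
        idx = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        if idx:  # skip empty groups when there are fewer patients than groups
            table.append((sum(risks[i] for i in idx) / len(idx),
                          sum(outcomes[i] for i in idx) / len(idx)))
    return table

print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Plotting the pairs returned by `calibration_by_tenths` against the diagonal gives the calibration plot; points above the diagonal indicate under-prediction and points below it over-prediction.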
To control for optimism (over-fitting), the model development process was repeated in 100 bootstrap samples. Each model was applied to the same bootstrap sample to quantify apparent performance, and then to the original dataset to evaluate test performance (c-statistic and calibration slope) and optimism (the difference between apparent and test performance). To estimate overall optimism, the average calibration slope across all bootstrap samples was calculated and used as a shrinkage coefficient, multiplying each variable's β coefficient and the intercept of the model derived in the original dataset to produce a final model for each toxicity endpoint.
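The arithmetic of the optimism correction and the shrinkage step can be sketched as follows, assuming the per-bootstrap performance measures have already been computed (the refitting in each bootstrap sample is omitted). The sketch follows the text in shrinking the intercept as well as the coefficients; a common alternative is to re-estimate the intercept after shrinking the slopes. The numbers in the example are illustrative only and are not taken from the study.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

def optimism_corrected(apparent: float, boot_apparent: Sequence[float],
                       boot_test: Sequence[float]) -> float:
    """Apparent performance minus the average optimism over bootstrap samples.

    boot_apparent[i]: model i (fitted in bootstrap sample i) evaluated in that same sample;
    boot_test[i]: the same model evaluated in the original dataset.
    """
    optimism = mean([a - t for a, t in zip(boot_apparent, boot_test)])
    return apparent - optimism

def shrink(coefficients: Dict[str, float], intercept: float,
           boot_slopes: Sequence[float]) -> Tuple[Dict[str, float], float]:
    """Multiply every coefficient, and the intercept as described in the text,
    by the mean bootstrap calibration slope."""
    s = mean(boot_slopes)
    return {name: s * beta for name, beta in coefficients.items()}, s * intercept

# Illustrative numbers only (not taken from the study)
print(round(optimism_corrected(0.70, [0.72, 0.71], [0.68, 0.69]), 3))  # 0.67
```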
The final models were applied to patients in the REQUITE validation cohort to predict the log odds of acute erythema or acute desquamation based on the presence or absence of one or more of the predictor variables. In this external validation step, the intercept of each final model was re-calibrated by subtracting the estimated intercept of the model in the REQUITE validation cohort. Performance of the model in the validation cohort was again assessed by calculating the c-statistic (AUC) and examining the calibration plot across tenths of predicted risk. Overall accuracy was measured by calculating the Brier score, which is the mean squared difference between the predicted risk and the observed outcome (0 or 1) across patients, with a score of zero indicating perfect accuracy.
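The Brier score and the calibration slope used in the validation step can be computed as sketched below. The calibration slope is obtained here, as is conventional, by regressing the observed outcome on the model's linear predictor (log odds) with a logistic model; the two-parameter fit is done by Newton-Raphson in plain Python purely for illustration, and the function names are not from the study.

```python
import math
from typing import Sequence, Tuple

def brier_score(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and observed 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(risks, outcomes)) / len(risks)

def calibration_slope(log_odds: Sequence[float], outcomes: Sequence[int],
                      iterations: int = 25) -> Tuple[float, float]:
    """Fit logit P(y = 1) = a + b * LP by Newton-Raphson and return (a, b).

    b is the calibration slope (1 = ideal, < 1 = predictions too extreme).
    """
    a, b = 0.0, 1.0  # start at "perfectly calibrated"
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for lp, y in zip(log_odds, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a + b * lp)))
            w = p * (1.0 - p)
            g_a += y - p            # score (gradient of the log-likelihood)
            g_b += (y - p) * lp
            h_aa += w               # information matrix entries
            h_ab += w * lp
            h_bb += w * lp * lp
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (a, b) += inverse(information) @ score
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b
```

If a model's log odds are systematically twice as extreme as they should be, the fitted slope is about 0.5; this is the kind of miscalibration that shrinkage or re-calibration is meant to correct.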

RESULTS
The literature search identified 10 studies of between 200 and 1,124 patients examining the association of acute breast radiation toxicity with predictor variables. Most studies reported acute skin toxicity scored according to RTOG, while only two studies used the CTCAE erythema scale (16,25). Depending on the published study, the variables associated significantly with toxicity on multivariate analysis were: age (50 and over, dichotomized), body mass index (BMI), breast size or volume, fractionation schedule (hypo- vs. conventional fractionation, dichotomized), use of boost, smoking (ever smoked), and tamoxifen use (see Table 3). Chemotherapy showed significant effects but in opposite directions in two studies (19,46) and was excluded. Interestingly, breast dose was not assessed as a continuous variable in any publication. Hypertension and diabetes were not significant on multivariate analysis in any study. One study used ordinal regression and did not report odds ratios, as endpoints were not dichotomized (20).

Predictive Clinical Model Development and External Validation
The distribution of patients across the three RGC development cohorts and the REQUITE validation cohort according to endpoint is shown in Table 4. Across the RGC cohorts (n = 2,031), there were 914 events of acute erythema (grade ≥2, 45.0%) and 175 events of acute desquamation (8.7%). The incidence of desquamation was lower in the Cambridge IMRT cohort. In the REQUITE validation cohort, 1,969 and 2,057 patient datasets were available for the endpoints acute erythema and acute desquamation, respectively. There were 450 patients with acute erythema (grade ≥2 erythema, 22.9%) and 192 patients with acute desquamation (grade ≥1 ulceration or grade ≥3 erythema, 9.3%).
Further detail regarding the distribution of clinical predictors in each cohort is available in Table 1. Median patient age in the REQUITE breast cohort was 58 years (range 23-80 years), similar to the RGC cohorts. REQUITE patients were treated with a median dose to the breast of 50 Gy (28.5-56 Gy) in 25 fractions (5-31), which is similar to the LeND and ISE cohorts. Patients in the Cambridge IMRT trial were exclusively treated with 40 Gy in 15 fractions. There was variation in use of boost between the different development cohorts and within the REQUITE multi-center cohort (see Methods). Although most other comorbidities and co-medications were similarly distributed, the proportion of smokers was higher in the non-UK cohorts, whereas the proportion of overweight patients (BMI ≥25) was higher in the UK cohorts.
Final logistic regression models for both toxicity endpoints following backward elimination are shown in Table 5. At this stage, the variables that satisfied the p < 0.1 stepwise inclusion threshold in the development cohorts for both endpoints were BMI, breast size, hypo-fractionation, use of boost, and smoking status. Tamoxifen use was associated with acute erythema only (OR 1.25, CI 1.05-1.26; Table 5). Age 50 and over was eliminated from both the acute erythema (OR 1.17, 0.89-1.53, p = 0.253) and acute desquamation (OR 1.45, 0.83-2.53, p = 0.194) models. Table 6 shows the apparent, optimism-corrected (after bootstrapping) and validation performance of both risk prediction models. After correcting for optimism, the final model for acute erythema discriminated patients with and without grade ≥2 erythema undergoing EBRT following BCS with an AUC of 0.645 (CI 0.619-0.667). Agreement between observed and predicted proportions was seen, with a calibration slope of 1.0319. The final log odds of acute erythema could be calculated as:
$$\text{log odds} = -2.265 + 0.049 \cdot \text{BMI} + 0.1 \cdot \text{breast\_size} - 1.565 \cdot \text{hypofractionation} + 0.302 \cdot \text{boost} + 0.308 \cdot \text{smoking} + 0.234 \cdot \text{tamoxifen}$$
After re-calibrating the intercept, applying the final model to the REQUITE cohort gave a c-statistic (AUC) of 0.651 (CI 0.622-0.680), indicating that the model performed equally well on validation, albeit with moderate calibration (slope = 0.665, 0.509-0.821) and a Brier score of 0.172 (Table 6). The calibration plot demonstrates that the model slightly overpredicts the probability of acute erythema in the REQUITE validation cohort (Figure 1), with a mean predicted probability of 25.7% against an observed incidence of 22.8%.
The final model for acute desquamation developed in the joint RGC cohorts was able to discriminate patients with an optimism-corrected AUC of 0.847 (CI 0.817-0.873) and a calibration slope of 1.043. The log odds of acute desquamation could be calculated as:
$$\text{log odds} = -7.226 + 0.111 \cdot \text{BMI} + 0.240 \cdot \text{breast\_size} - 2.592 \cdot \text{hypofractionation} + 0.606 \cdot \text{boost} + 0.435 \cdot \text{smoking}$$
Applying the final model with re-calibrated intercept to the REQUITE validation cohort gave a c-statistic (AUC) of 0.697 (CI 0.658-0.737). This drop in AUC indicates poorer discrimination, with correspondingly poorer calibration (slope = 0.376, 0.260-0.492) (Table 6). The model substantially under-predicts the probability of acute desquamation in the REQUITE cohort, with a mean predicted probability of 3.0% against an observed incidence of 9.3% (Figure 2). The Brier score was 0.085.
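The two reported equations can be applied to an individual patient as sketched below. This uses the development-cohort equations exactly as reported, not the REQUITE-recalibrated intercepts (which are not given here). It assumes binary predictors are coded 1 = present and 0 = absent. The `breast_size` value must follow the study's continuous breast-size variable, whose exact scale is not restated here, so the example patient's values are hypothetical.

```python
import math
from typing import Dict

# Coefficients of the final development-cohort equations reported above.
# Assumption: binary predictors coded 1 = present, 0 = absent; breast_size on the
# study's continuous breast-size scale.
EQUATIONS: Dict[str, Dict[str, float]] = {
    "acute_erythema": {"intercept": -2.265, "bmi": 0.049, "breast_size": 0.1,
                       "hypofractionation": -1.565, "boost": 0.302,
                       "smoking": 0.308, "tamoxifen": 0.234},
    "acute_desquamation": {"intercept": -7.226, "bmi": 0.111, "breast_size": 0.240,
                           "hypofractionation": -2.592, "boost": 0.606, "smoking": 0.435},
}

def log_odds(endpoint: str, patient: Dict[str, float]) -> float:
    coefs = EQUATIONS[endpoint]
    # patient must supply every predictor used by the chosen equation (KeyError otherwise)
    return coefs["intercept"] + sum(beta * patient[name]
                                    for name, beta in coefs.items() if name != "intercept")

def equation_risk(endpoint: str, patient: Dict[str, float]) -> float:
    return 1.0 / (1.0 + math.exp(-log_odds(endpoint, patient)))

# Hypothetical patient: BMI 30, breast-size value 10, conventional fractionation, boost
patient = {"bmi": 30, "breast_size": 10, "hypofractionation": 0,
           "boost": 1, "smoking": 0, "tamoxifen": 0}
print(round(log_odds("acute_erythema", patient), 3))      # 0.507
print(round(log_odds("acute_desquamation", patient), 3))  # -0.89
```

Because tamoxifen does not appear in the desquamation equation, the same patient record can be passed to both endpoints; the extra key is simply ignored.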

DISCUSSION
The aim of this study was to develop and validate predictive models for acute skin erythema and acute desquamation following whole-breast external beam radiotherapy and breast-conserving surgery for breast cancer, which could be used without the need for detailed radiation dosimetry, in order to allow clinicians to estimate toxicity risk at the time of breast cancer diagnosis and before any treatment is planned. Previous work in prostate cancer showed that dosimetric models for radiation toxicity can be improved by adding clinical and co-treatment risk factors (14,15).
The initial literature search of published predictors significantly associated with acute breast radiation toxicity in multivariate analysis confirmed a number of variables, including BMI, breast size or volume, hypo-fractionation (protective), boost, tamoxifen use and smoking status. Variables relating to BMI and breast size or volume have been most frequently reported in previous smaller cohorts (Table 3) as well as in published randomized clinical trials (19,47). Moreover, both of these trials highlighted breast volume as a stand-alone predictor of acute radiation toxicity, independent of dose inhomogeneity. Interestingly, none of the previous publications assessed breast dose as a predictor in itself, only fractionation schedule. However, findings from the UK breast hypo-fractionation trials and radiobiology have shown that acute toxicity is related to total breast dose (48), not dose per fraction as for late toxicity (7). The protective association with hypo-fractionation reported in the literature is therefore likely due to the reduction in total dose for safe hypo-fractionation. Results of the literature search did not confirm an association with acute breast radiation toxicity for the predictors diabetes, cardiovascular disease and hypertension, whereas in the past radiation sensitivity has at least in part been attributed to the presence of cardiovascular disease or diabetes mellitus, both of which affect the microvasculature (49). However, it is likely that many patients enrolled in the reported cohorts were also on some form of anti-diabetic agent or a statin. Radioprotective effects of both metformin and gliclazide on human cells have been reported, at least in vitro (50,51), and there is evidence that statins may accelerate DNA repair (52) and reduce the expression of pro-inflammatory cytokines (53).
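The argument that hypo-fractionation lowers acute toxicity through a lower total dose can be made concrete with the equivalent dose in 2 Gy fractions (EQD2) from the linear-quadratic model mentioned in the Introduction. The α/β values below (about 10 Gy for acutely responding skin, 3 Gy for late effects) are conventional textbook assumptions, not estimates from this study; the two schedules are those that appear in the cohorts (40 Gy in 15 fractions in the Cambridge trial, median 50 Gy in 25 fractions in REQUITE).

```python
def eqd2(total_dose: float, n_fractions: int, alpha_beta: float) -> float:
    """Equivalent dose in 2 Gy fractions under the LQ model: D * (d + a/b) / (2 + a/b)."""
    d = total_dose / n_fractions  # dose per fraction (Gy)
    return total_dose * (d + alpha_beta) / (2.0 + alpha_beta)

# Assumed alpha/beta values: ~10 Gy for acutely responding skin, ~3 Gy for late effects
for label, ab in [("acute (a/b = 10 Gy)", 10.0), ("late (a/b = 3 Gy)", 3.0)]:
    print(label, round(eqd2(50.0, 25, ab), 1), round(eqd2(40.0, 15, ab), 1))
# acute (a/b = 10 Gy) 50.0 42.2
# late (a/b = 3 Gy) 50.0 45.3
```

With a high α/β, the EQD2 of the hypofractionated schedule stays close to its physical total dose of 40 Gy, well below 50 Gy. This is consistent with acute reactions tracking total dose rather than dose per fraction.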
Although several studies have investigated the association of acute breast toxicity with clinical and treatment factors, to date, none have produced a clinical prediction model. Population-based measures of toxicity risk may not accurately reflect risk for an individual patient, but accurate prediction models can inform patients and clinicians about the future course of their condition or illness, thereby helping guide decisions about treatment. For a prediction model to be valuable, it should not only have predictive ability in the development cohort but must also perform well in a validation cohort. In the present study, the model to predict the risk of acute erythema following breast radiotherapy performed moderately well in the RGC development cohorts and equally well in the external REQUITE validation cohort, with an AUC of 0.65, while calibration showed moderate agreement between predicted and observed toxicity outcomes in the validation cohort. By contrast, the discrimination of the model to predict the risk of acute desquamation following breast radiotherapy decreased considerably more in the external validation cohort (AUC = 0.70) than expected from internal validation (optimism-corrected AUC = 0.85), with relatively poor calibration.
Reasons why a predictive model may perform substantially differently between development and validation cohorts include over-fitting, missing important predictor variables, measurement errors of predictors, or differences in patient case mix between cohorts. Measurement errors can arise from inter-observer variability across different cohorts and centers, as well as from the use of different scales and time points to assess acute toxicity endpoints. Acute toxicity in both the Cambridge IMRT trial and the REQUITE study was assessed in the final week of radiotherapy. The acute reaction may not peak until 1-2 weeks after the end of treatment and hence could have been missed in some patients, although this would not have been the case in the ISE cohort study, in which patients were assessed at the end of treatment and 6 weeks afterwards. In the LeND study, acute toxicity was coded retrospectively from the medical notes, which may have led to bias if the original documentation was unclear. In both the LeND and Cambridge IMRT cohorts, toxicity was assessed using the RTOG scale, which does not separate out patients with oedema; this might in part explain the lower proportion of cases with grade ≥2 erythema in these cohorts compared to the ISE cohort, although the proportion of cases in the REQUITE validation cohort is more similar to that of the LeND and Cambridge cohorts. The ability to detect and grade skin changes also depends on skin tone, which is not readily captured in the RTOG and CTCAE scales.
The model developed across the three RGC cohorts in this study included clinically relevant predictors that satisfied the relatively loose criterion for inclusion in the model (p < 0.1). The purpose of multivariate prediction modeling is estimation rather than testing for association with risk factors, and it may therefore be reasonable to include clinical predictors despite non-significant association or collinearity, to ensure that important predictors are not missed (54). To address overfitting and to correct for optimism, bootstrapping was used as the internal validation technique, whereas other studies with reasonably large datasets have used split-sample training-validation or cross-validation (55). It is possible, but not very likely, that a different internal validation method would have produced different results from the bootstrapping method used in the present study.
The value of the c-statistic (AUC) depends not only on the model of interest but also on the distribution of predictor and endpoint variables within a given patient population. Many radiotherapy patients present with a similar constellation of demographics and co-morbidities and are treated with similar plans, making discrimination a difficult task. In this study, acute desquamation was a relatively rare event in both the development and validation cohorts (8.7 and 9.3%), whereas acute erythema was somewhat more evenly balanced between cases and non-cases within each dataset (45.0 and 22.9%).
The distribution of clinical predictors was broadly similar across the three development cohorts and the validation cohort, apart from smoking status and BMI, and none of the patients in the ISE cohort received chemotherapy. Overall, the distribution of clinical predictors was also similar to other previously published cohorts (21,24,25). Nevertheless, there was considerable heterogeneity between the centers within REQUITE with regard to treatment variables, such as dose fractionation, use of boost, and inclusion of patients who received prior adjuvant chemotherapy, as well as observed toxicity frequencies (39). Differences in radiotherapy techniques over time may also have affected the generalizability of the prediction models to the validation cohort, as the patients in the three RGC development cohorts were on average treated more than 10 years before those enrolled in REQUITE. Certainly, there has been widespread uptake of intensity-modulated radiotherapy (IMRT) over that time, with almost 50% of patients enrolled in REQUITE treated in this way, whereas only some patients in the Cambridge trial cohort were randomized to IMRT and none of the patients in LeND and ISE received IMRT. Because of this and the lack of data in the previous literature, radiotherapy technique, such as IMRT, was not included in the model development phase.
A generalized linear mixed model (GLMM) approach was chosen in this study to try to address cohort heterogeneity and to relax the assumption of independence of predictor variables. Using an alternative statistical method, such as Lasso regression, or data-mining approaches, such as machine learning algorithms, might have identified other predictor variables or potential interactions in patients with several marginal risk factors (56). Machine learning algorithms are used with increasing frequency, particularly in the context of multi-dimensional "big data" such as electronic health records and radiotherapy imaging (57). However, the data available for this study, in particular from the older RGC cohorts, were more limited and did not reach the multi-dimensionality usually associated with machine learning projects.
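To make the Lasso alternative concrete, the sketch below fits an L1-penalised logistic regression by proximal gradient descent (ISTA): each iteration takes a gradient step on the mean log-loss and then soft-thresholds the coefficients, which shrinks weak predictors exactly to zero. This is an illustrative sketch only, not a method used in the study. The eight candidate predictors, their effect sizes, the penalty values and the function name are hypothetical, predictors are assumed to be standardised, and the intercept is left unpenalised.

```python
import numpy as np


def lasso_logistic(X, y, lam, n_iter=2000):
    """L1-penalised logistic regression by ISTA; X holds standardised predictors (no intercept)."""
    n, p = X.shape
    X_aug = np.column_stack([np.ones(n), X])
    step = 4.0 * n / np.linalg.norm(X_aug, 2) ** 2  # 1 / Lipschitz constant of the mean log-loss gradient
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(b0 + X @ beta))) - y
        b0 -= step * resid.mean()                   # plain gradient step for the unpenalised intercept
        z = beta - step * (X.T @ resid) / n         # gradient step for the coefficients
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return b0, beta


# Simulated data: eight hypothetical candidate predictors, only the first two carry signal.
rng = np.random.default_rng(7)
n = 1000
X = rng.normal(size=(n, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)
eta = -0.5 + 1.0 * X[:, 0] - 0.8 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

_, beta_light = lasso_logistic(X, y, lam=0.05)
_, beta_heavy = lasso_logistic(X, y, lam=1.0)
print(np.round(beta_light, 2))
```

In practice the penalty would be chosen by cross-validation, and with only a handful of clinical predictors, as here, the gain over a conventional model may be modest.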
Despite these limitations, it is important to note that the predictive model for acute erythema was validated in the absence of detailed radiation dosimetry and notwithstanding the differences in radiotherapy techniques between treatment centers and countries, particularly in the REQUITE cohort. The performance of this model across different cohorts suggests that these findings are reproducible and generalizable beyond the original development dataset, while acknowledging the model's tendency to overpredict in the external REQUITE cohort. The calibration plot demonstrates that the model can identify high-risk patients and that observed and expected rates remained correlated. This suggests that the model for acute erythema will benefit from further re-calibration of certain variable coefficients without the need to redesign the model from scratch. In the case of acute desquamation, shrinkage and recalibration would not remedy the model's reduced discriminatory power in the validation cohort. To improve discrimination, the model would need to be revised, for example, by further adjusting the regression coefficients of predictors whose strength or direction of effect differs between the RGC development and REQUITE cohorts, by stepwise selection of additional predictors such as those relating to radiotherapy technique (e.g., IMRT), or by re-estimating all regression coefficients in the validation population. These approaches to updating the model need to be balanced against the fact that information in the original model would be discarded and that the updated model would require further validation elsewhere.
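Logistic recalibration, one common way to re-calibrate an overpredicting model, can be sketched as regressing the observed outcome on the logit of the original predicted probability, logit P(y = 1) = a + b · logit(p̂). A calibration intercept a below 0 indicates overprediction, and a slope b below 1 indicates predictions that are too extreme. This is a minimal sketch assuming simulated validation data in which the true risk is one logit unit lower than predicted; the function names and numbers are hypothetical and are not the REQUITE estimates. The Brier score is shown before and after updating.

```python
import numpy as np


def logit(p):
    return np.log(p / (1.0 - p))


def recalibrate(p_pred, y, n_iter=25):
    """Fit logit P(y=1) = a + b * logit(p_pred) by Newton-Raphson; returns (a, b)."""
    lp = logit(p_pred)
    X = np.column_stack([np.ones_like(lp), lp])
    coef = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        coef = coef + np.linalg.solve(hessian, X.T @ (y - p))
    return coef[0], coef[1]


def brier(p, y):
    return np.mean((p - y) ** 2)


rng = np.random.default_rng(3)
lp = rng.normal(-0.5, 1.0, 3000)                       # hypothetical linear predictor of the original model
p_pred = 1.0 / (1.0 + np.exp(-lp))                     # what the original model predicts
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(lp - 1.0)))) # true risk is lower, so the model overpredicts

a, b = recalibrate(p_pred, y)
p_new = 1.0 / (1.0 + np.exp(-(a + b * logit(p_pred))))
print(f"calibration intercept a = {a:.2f}, slope b = {b:.2f}")
print(f"Brier score before {brier(p_pred, y):.3f}, after {brier(p_new, y):.3f}")
```

Because this update is monotone whenever b > 0, the ranking of patients, and therefore the AUC, is unchanged. This is why recalibration can correct overprediction for acute erythema but cannot rescue the poor discrimination of the desquamation model.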
To increase clinical relevance, novel performance measures such as net reclassification improvement (NRI) and net benefit (NB) could also be considered (58). Risk models that do not recommend clinical decisions are less likely to change treatment decision-making behavior than those that translate risk into a treatment recommendation (59). Nevertheless, the clinical risk model presented here, which does not require detailed radiation dosimetry, can be used relatively simply in practice to predict a patient's probability of acute skin radiation toxicity at the time of breast cancer diagnosis; this can then be taken into account when discussing treatment options with patients.
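Net benefit, as used in decision curve analysis, weighs true positives against false positives at a chosen threshold probability p_t: NB = TP/n − FP/n × p_t/(1 − p_t). The sketch below computes it for a model and for the default strategy of treating everyone as high risk. The ten patients, their predicted risks and the 25% threshold are invented purely to illustrate the arithmetic, and the function names are hypothetical.

```python
import numpy as np


def net_benefit(p_pred, y, threshold):
    """Net benefit of flagging patients whose predicted risk is at or above the threshold."""
    flagged = p_pred >= threshold
    n = len(y)
    tp = np.sum(flagged & (y == 1))
    fp = np.sum(flagged & (y == 0))
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y, threshold):
    """Net benefit of treating every patient as high risk."""
    prevalence = np.mean(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Ten hypothetical patients: observed toxicity (1 = yes) and model-predicted risk.
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
p_pred = np.array([0.8, 0.6, 0.2, 0.5, 0.3, 0.1, 0.1, 0.1, 0.05, 0.05])

nb_model = net_benefit(p_pred, y, threshold=0.25)
nb_all = net_benefit_treat_all(y, threshold=0.25)
print(f"model {nb_model:.3f}, treat-all {nb_all:.3f}")  # model 0.133, treat-all 0.067
```

At this threshold the model flags two true and two false positives, so its net benefit (2/15) exceeds both treating everyone (1/15) and treating no one (0). Repeating the calculation over a range of thresholds gives the decision curve.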

CONCLUSIONS
While most published prediction research in the field of local breast cancer treatment toxicity continues to focus solely on model development, this study reports the development and external validation of a predictive model for acute erythema following radiotherapy after breast-conserving surgery, which retained moderate discriminatory power but will benefit from further re-calibration. A similar model to predict acute desquamation using clinical risk factors failed to validate in the REQUITE cohort. Although other statistical or machine learning techniques may improve the performance of clinical risk models in the future, more accurate predictions are expected through the addition of genetic markers. This information could be considered when discussing breast cancer treatment options at the outset, particularly with patients predicted to be at high risk of radiation toxicity.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Manchester North West UK NRES Approval 14/NW/0035. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
TRat conceived the study design and wrote the first draft of the paper. TRat and PS analyzed the data. TRat, PS, JC-C, TRan, RS, CW, and CT contributed to the interpretation of the data. CW is lead chief investigator and CT is deputy lead of the REQUITE study. JC-C is chief investigator of the ISE study, CEC is chief investigator of the Cambridge IMRT trial. RS is chief investigator of the LeND study. TRat, MA-B, MA, DA, GB, RB, JC-C, CEC, M-PF, KJ, GP, TRan, VR, BR, DD, MdS, ES, RS, BT-V, RV, and LV contributed patients to the participating studies. HS is the breast cancer patient advocate on the REQUITE study. AM and AW curated the database for the REQUITE study. All authors commented on and approved the final manuscript.